
@SingleAccretion commented Sep 22, 2025

See the commit message below for details on the ABI changes. The main idea is to make our P/Invoke (PI) transition frame "zero-sized" by defining it to be the shadow stack top.

Diffs (WasmDebugging, browser):

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 647755
Total bytes of diff: 646660
Total bytes of delta: -1095 (-0.17% of base)
Average relative delta: -16.24%
    diff is an improvement
    average relative diff is an improvement

Top method regressions (percentages):
          21 (161.54% of base) : 1001.dasm - Thread::Destroy()
          45 (56.96% of base) : 1000.dasm - Thread::Construct()
          27 (20.77% of base) : 1006.dasm - RhpReversePInvoke
          15 (13.27% of base) : 1002.dasm - Thread::GcScanRoots(void (*)(Object**, ScanContext*, unsigned int), ScanContext*)
           7 ( 6.80% of base) : 1003.dasm - RhpWaitForGC2
           1 ( 2.63% of base) : 1009.dasm - RhpPInvokeReturn

Top methods only present in diff:
         211 (     ∞ of base) : 1053.dasm - RhpReversePInvokeAndPushSparseVirtualUnwindFrame
         184 (     ∞ of base) : 1052.dasm - Thread::ReversePInvokeAttachOrTrapThread_Wasm(unsigned long)
          33 (     ∞ of base) : 1054.dasm - RhpReversePInvokeReturnAndPopSparseVirtualUnwindFrame
          22 (     ∞ of base) : 1051.dasm - Thread::GetTransitionFrame()

Top method improvements (percentages):
         -57 (-78.08% of base) : 1024.dasm - S_P_CoreLib_System_Runtime_GCStress__Initialize
         -22 (-62.86% of base) : 1008.dasm - RhpPInvoke
         -45 (-55.56% of base) : 1047.dasm - S_P_CoreLib_System_Threading_Thread__LongSpinWait
         -45 (-55.56% of base) : 1033.dasm - S_P_CoreLib_Interop_Sys__LowLevelMonitor_Release
         -45 (-55.56% of base) : 1017.dasm - S_P_CoreLib_Interop_Sys__Free
         -45 (-52.94% of base) : 1016.dasm - S_P_CoreLib_System_Runtime_InternalCalls__RhGetGcTotalMemory
         -43 (-51.81% of base) : 1014.dasm - S_P_CoreLib_System_Runtime_InternalCalls__RhEndNoGCRegion
         -43 (-50.00% of base) : 1043.dasm - S_P_CoreLib_System_Threading_Thread__Yield
         -45 (-48.91% of base) : 1018.dasm - S_P_CoreLib_System_Runtime_InternalCalls__RhCollect
         -45 (-42.86% of base) : 1015.dasm - S_P_CoreLib_System_Runtime_InternalCalls__RhStartNoGCRegion
         -45 (-41.28% of base) : 1042.dasm - S_P_CoreLib_System_Runtime_InteropServices_NativeMemory__Alloc_0
         -43 (-40.19% of base) : 1037.dasm - S_P_CoreLib_Internal_Runtime_FrozenObjectHeapManager__ClrVirtualReserve
         -41 (-39.81% of base) : 1046.dasm - S_P_CoreLib_System_Buffer___ZeroMemory
         -45 (-37.50% of base) : 1013.dasm - S_P_CoreLib_Internal_Runtime_CompilerHelpers_StartupCodeHelpers__InitializeModuleFrozenObjectSegment
         -41 (-37.27% of base) : 1044.dasm - S_P_CoreLib_System_Buffer___Memmove
          -9 (-36.00% of base) : 1007.dasm - RhpReversePInvokeReturn
         -41 (-31.78% of base) : 1031.dasm - S_P_CoreLib_System_Threading_LowLevelMonitor__DisposeCore
         -41 (-28.47% of base) : 1038.dasm - S_P_CoreLib_Internal_Runtime_Augments_RuntimeAugments__InitializeStackTraceIpMap
         -51 (-24.76% of base) : 1027.dasm - S_P_CoreLib_System_Threading_LowLevelLock__SignalWaiter
         -43 (-24.57% of base) : 1035.dasm - S_P_CoreLib_System_Threading_LowLevelMonitor__Initialize

Top methods only present in base:
         -66 (-100.00% of base) : 1005.dasm - RhpGetOrInitShadowStackTop
         -15 (-100.00% of base) : 1004.dasm - RhpReversePInvokeAttachOrTrapThread2

55 total methods with Code Size differences (45 improved, 10 regressed)

Contributes to #3163.

@SingleAccretion force-pushed the PI-Opt-Abi branch 3 times, most recently from 00e0915 to 61ca405, September 23, 2025 21:39
@SingleAccretion force-pushed the PI-Opt-Abi branch 2 times, most recently from 6761d15 to ddbc536, October 9, 2025 21:39

This changes the managed ABI in the following ways:

1) The PI transition frame is now simply the shadow stack top. This
   "frame" is zero-sized: we don't store the current thread in it,
   since the WASM TLS model allows us to elide it.

   The obvious benefit is that the PI path is now close to minimal:
   two stores and one load. We can get rid of the load in a
   single-threaded (ST) build as well, but that's left for a future change.

2) The RPI transition frame is now always allocated at offset zero
   and returned directly from the RPI helper. We thus elide the intermediate
   state where we already have the shadow stack but haven't yet attached
   the thread. This brings us in line with other targets.

3) The sparse virtual unwind frame is now allocated right after
   the RPI frame, and "combined" RPI helpers are introduced that both
   perform the transition and push the EH frame.
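Point 1 can be illustrated with a minimal C++ sketch of the zero-sized PI transition, assuming the "two stores and one load" are: setting the thread's transition frame on entry, clearing it on return, and reading a GC trap flag on return. `ThreadState`, `t_currentThread`, `g_trapReturningThreads`, `PInvokeEnter` and `PInvokeReturn` are hypothetical names, not the runtime's actual types or helpers; a C++ `thread_local` stands in for the WASM TLS model.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-thread runtime state (illustrative, not the real runtime type).
struct ThreadState {
    void* transitionFrame = nullptr; // non-null while the thread runs native code
};

// thread_local stands in for WASM TLS: the current thread is always reachable,
// so the transition "frame" does not need to record it.
thread_local ThreadState t_currentThread;

// Hypothetical flag a GC would set to make threads returning from native code wait.
std::atomic<bool> g_trapReturningThreads{false};

// Entry: the transition "frame" is just the current shadow stack top (zero-sized).
inline void PInvokeEnter(void* shadowStackTop) {
    t_currentThread.transitionFrame = shadowStackTop; // store #1
}

// Return: leave native mode, then report whether a slow-path wait is needed.
inline bool PInvokeReturn() {
    t_currentThread.transitionFrame = nullptr;                     // store #2
    return g_trapReturningThreads.load(std::memory_order_acquire); // load #1
}
```

In a real runtime a `true` result from `PInvokeReturn` would lead to a slow path that waits for the GC; the sketch only reports it.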

The RPI changes reduce the number of helper calls that any RPI method
needs to make from 3 to 1 in prologs (2 to 1 in epilogs) in the sparse
virtual unwinding model, and reduce the number of instructions on the
critical path.
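The RPI frame layout and the combined helpers described above can be sketched roughly as follows. This is a hedged illustration, not the runtime's implementation: all type, field and function names (`RpiShadowFrame`, `RpiEnterAndPushEhFrame`, `RpiReturnAndPopEhFrame`, `g_helperCalls`, and so on) are hypothetical, and "attaching" a thread is reduced to setting a flag. It shows the RPI frame at offset zero, the EH frame right after it, and one helper call per prolog and per epilog.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layouts; the field names are illustrative only.
struct ReversePInvokeFrame {
    void* savedTransitionFrame; // caller's PI transition frame, restored on return
};

struct SparseVirtualUnwindFrame {
    SparseVirtualUnwindFrame* prev; // next-older EH frame on this thread
    const void* unwindTable;        // per-method unwind info (placeholder)
};

// An RPI method's shadow frame: RPI frame at offset zero, EH frame right after.
struct RpiShadowFrame {
    ReversePInvokeFrame rpi;
    SparseVirtualUnwindFrame eh;
};
static_assert(offsetof(RpiShadowFrame, rpi) == 0, "RPI frame must be at offset zero");

// Hypothetical per-thread state.
struct ThreadState {
    bool attached = false;
    void* transitionFrame = nullptr; // non-null while the thread runs native code
    SparseVirtualUnwindFrame* topEhFrame = nullptr;
};
thread_local ThreadState t_thread;
int g_helperCalls = 0; // counts helper invocations, for illustration only

// Prolog: one call that attaches the thread if needed, transitions into
// managed code and pushes the EH frame.
RpiShadowFrame* RpiEnterAndPushEhFrame(void* shadowStackTop, const void* unwindTable) {
    ++g_helperCalls;
    auto* frame = static_cast<RpiShadowFrame*>(shadowStackTop);
    if (!t_thread.attached) {
        t_thread.attached = true; // stand-in for the one-time attach slow path
    }
    frame->rpi.savedTransitionFrame = t_thread.transitionFrame;
    t_thread.transitionFrame = nullptr; // now running managed code
    frame->eh.prev = t_thread.topEhFrame;
    frame->eh.unwindTable = unwindTable;
    t_thread.topEhFrame = &frame->eh;
    return frame; // the RPI frame is returned directly to the caller
}

// Epilog: one call that pops the EH frame and undoes the transition.
void RpiReturnAndPopEhFrame(RpiShadowFrame* frame) {
    ++g_helperCalls;
    t_thread.topEhFrame = frame->eh.prev;
    t_thread.transitionFrame = frame->rpi.savedTransitionFrame;
}
```

Because the EH frame sits at a fixed offset after the RPI frame, a single helper can initialize both from one pointer. That is the property the combined helpers rely on to replace several separate calls.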

Benchmarks:
  Node base:
    Bench_PInvoke took               : 86 ms (8.64 ns / op)
    Bench_ReversePInvoke_Empty took  : 113 ms (11.30 ns / op)
    Bench_ReversePInvoke_WithEH took : 172 ms (17.24 ns / op)

  Node diff:
    Bench_PInvoke took               : 81 ms (8.06 ns / op)
    Bench_ReversePInvoke_Empty took  : 58 ms (5.81 ns / op)
    Bench_ReversePInvoke_WithEH took : 108 ms (10.79 ns / op)

  Wasmtime base:
    Bench_PInvoke took               : 99 ms (9.86 ns / op)
    Bench_ReversePInvoke_Empty took  : 73 ms (7.28 ns / op)
    Bench_ReversePInvoke_WithEH took : 77 ms (7.71 ns / op)

  Wasmtime diff:
    Bench_PInvoke took               : 82 ms (8.16 ns / op)
    Bench_ReversePInvoke_Empty took  : 31 ms (3.06 ns / op)
    Bench_ReversePInvoke_WithEH took : 50 ms (4.98 ns / op)
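To put the numbers above in relative terms, the small C++ helper below divides base ns/op by diff ns/op for each host. The values are copied from the benchmark output; `BenchResult`, `Speedup` and `PrintSpeedups` are illustrative names only.

```cpp
#include <cassert>
#include <cstdio>

// Per-operation timings (ns/op) copied from the benchmark output.
struct BenchResult {
    const char* name;
    double baseNsPerOp;
    double diffNsPerOp;
};

// A ratio above 1 means the diff is faster than the base.
double Speedup(const BenchResult& r) { return r.baseNsPerOp / r.diffNsPerOp; }

const BenchResult kNode[] = {
    {"Bench_PInvoke", 8.64, 8.06},
    {"Bench_ReversePInvoke_Empty", 11.30, 5.81},
    {"Bench_ReversePInvoke_WithEH", 17.24, 10.79},
};

const BenchResult kWasmtime[] = {
    {"Bench_PInvoke", 9.86, 8.16},
    {"Bench_ReversePInvoke_Empty", 7.28, 3.06},
    {"Bench_ReversePInvoke_WithEH", 7.71, 4.98},
};

// Prints one line per benchmark with its base/diff ratio.
void PrintSpeedups(const char* host, const BenchResult* results, int count) {
    for (int i = 0; i < count; ++i) {
        std::printf("%-9s %-28s %.2fx\n", host, results[i].name, Speedup(results[i]));
    }
}
```

Computed this way, the empty RPI path is roughly 1.94x faster on Node and 2.38x faster on Wasmtime. RPI with EH is about 1.60x and 1.55x faster, and PInvoke about 1.07x and 1.21x faster.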

@SingleAccretion marked this pull request as ready for review October 10, 2025 15:31
@SingleAccretion (author) commented:

@dotnet/nativeaot-llvm
